Junichi AKITA Hiroaki TAKAGI Takeshi NAGASAKI Masashi TODA Toshio KAWASHIMA Akio KITAGAWA
Rapid eye motion, the so-called saccade, is a very quick eye movement that occurs regardless of our intention. Although line-of-sight (LOS) detection with saccade tracking is expected to enable a new type of computer-human interface, saccades cannot be tracked with a conventional video camera because of their speed, which is often as high as 600 degrees per second. A vision chip is an intelligent image sensor that integrates the photoreceptors and the image processing circuitry on a single chip, so that it can process the acquired image while preserving its spatial parallelism. It also makes it possible to implement a very compact integrated vision system. In this paper, we describe a vision chip architecture that can detect the line of sight from an infrared eye image at a processing speed that supports saccade tracking. The vision chip described here has a pixel-parallel processing architecture, with a node automaton at each pixel performing the image processing. The acquired image is first digitized by comparators into two flags indicating the Purkinje image and the pupil. The digitized images are then shrunk, followed by several steps of expansion, by the node automata located at each pixel. The shrinking process is repeated until all the pixels disappear, and the last pixel to disappear indicates the center of the Purkinje image and of the pupil. This disappearing step is detected by projection circuitry in the pixel circuit for fast operation, and the coordinates of the centers of the Purkinje image and the pupil are generated by simple encoders. We describe the whole architecture of this vision chip, as well as the pixel architecture. We also describe the evaluation of the proposed algorithm by numerical simulation and of its processing speed using an FPGA, and the improvement in resolution using a column-parallel architecture.
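The shrink-until-disappear step can be illustrated in software. The following is a minimal sketch only, assuming a 4-neighbour shrinking rule and a binary image stored as nested lists; the function names `shrink_once` and `last_surviving_pixels` are hypothetical, and the chip's actual node automaton rule is not given in the abstract.

```python
def shrink_once(img):
    """One shrinking step (assumed rule): a pixel survives only if it
    and all four of its direct neighbours are set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if all(0 <= ny < h and 0 <= nx < w and img[ny][nx]
                   for ny, nx in neighbours):
                out[y][x] = 1
    return out


def last_surviving_pixels(img):
    """Shrink repeatedly and return the pixels that were still set
    just before the image became empty."""
    current = img
    if not any(any(row) for row in current):
        return []
    while True:
        shrunk = shrink_once(current)
        if not any(any(row) for row in shrunk):
            return [(y, x)
                    for y, row in enumerate(current)
                    for x, v in enumerate(row) if v]
        current = shrunk


# A 5 x 5 blob of set pixels inside a 7 x 7 frame stands in for the pupil flag.
eye = [[1 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)]
       for y in range(7)]
centre = last_surviving_pixels(eye)
```

For this symmetric blob the last surviving pixel is the geometric center. In the chip, every pixel performs this step in parallel, so the number of iterations depends on the blob radius rather than on the image size.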
Kimihiro NISHIO Hiroo YONEZU Yuzo FURUKAWA
A two-dimensional network for motion detection constructed of simple analog circuits was proposed and designed based on the frog visual system. In the frog visual system, the two-dimensional motion of a moving object can be detected by performing simple information processing in the tectum and thalamus of the frog brain. The measured results of the test chip fabricated by a 1.2 µm complementary metal oxide semiconductor (CMOS) process confirmed the correct operation of the basic circuits in the network. The results obtained with the simulation program with integrated circuit emphasis (SPICE) showed that the proposed network can detect the motion direction and velocity of a moving object. Thus, a chip for two-dimensional motion detection was realized using the proposed network.
Kimihiro NISHIO Hiroo YONEZU Yuzo FURUKAWA
A network for the detection of an approaching object with simple-shape recognition is proposed based on lower animal vision. The locust can detect an approaching object through a simple process in the descending contralateral movement detector (DCMD) in the locust brain, by which the approach velocity and direction of the object are determined. The frog can recognize simple shapes through a simple process in the tectum and thalamus in the frog brain. The proposed network is constructed of simple analog complementary metal oxide semiconductor (CMOS) circuits. The integrated circuit of the proposed network is fabricated with a 1.2 µm CMOS process. Measured results for the proposed circuit indicate that the approach velocity and direction of an object can be detected from the output current of the analog circuit based on the DCMD response. Moving objects with simple shapes, such as circles, squares, triangles and rectangles, were recognized using the proposed frog-visual-system-based circuit.
Yuan-Long JEANG Jer-Min JOU Win-Hsien HUANG
In this paper, a methodology based on a mix-mode interconnection architecture is proposed for constructing an application-specific network on chip (NOC) that minimizes the total communication time. The proposed architecture uses a globally asynchronous communication network and a locally synchronous bus (or crossbar, or multistage interconnection network, MIN). First, a local bus is assigned to a group of IP cores so that the communications within this local bus can be arranged to be exclusive in time. If the communications of some IP cores must be completed within a given amount of time, then a non-blocking MIN or a crossbar switch should be used for those IP cores instead of a bus. Then, a communication ratio (CR) for each pair of local buses is provided by the user, and, following the Huffman coding philosophy, a binary tree (BT) is constructed with switches on the internal nodes and buses on the leaves. Since the binary tree system is deadlock free (no cycle exists in any path), the router is just a relatively simple and cheap switch. Simulation results show that the proposed NOC methodology and architecture is better in switching circuit cost and performance than the SPIN and the mesh architecture using our developed deadlock-free router.
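The tree construction can be sketched as a Huffman-style greedy merge. The abstract does not give the exact merging rule, so the sketch below assumes one: at each step, the two clusters of buses with the highest total communication ratio between them are joined under a new switch, so that heavily communicating buses end up close together. The function name `build_switch_tree` and the CR values are hypothetical.

```python
from itertools import combinations


def build_switch_tree(buses, cr):
    """Build a binary tree with buses on the leaves and switches on the
    internal nodes. `cr` maps frozenset({a, b}) to the communication
    ratio between local buses a and b. Assumed rule: repeatedly merge
    the pair of clusters with the highest traffic between them."""
    # Each cluster is (subtree, set of bus names below it).
    clusters = [(bus, frozenset([bus])) for bus in buses]

    def traffic(c1, c2):
        return sum(cr.get(frozenset((a, b)), 0.0)
                   for a in c1[1] for b in c2[1])

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: traffic(clusters[p[0]], clusters[p[1]]))
        left, right = clusters[i], clusters[j]
        merged = ((left[0], right[0]), left[1] | right[1])  # new switch node
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical communication ratios for four local buses.
cr = {
    frozenset(("A", "B")): 0.5,
    frozenset(("C", "D")): 0.3,
    frozenset(("A", "C")): 0.1,
    frozenset(("B", "D")): 0.1,
}
tree = build_switch_tree(["A", "B", "C", "D"], cr)
```

Each tuple in the result is one switch. Because the structure is a tree, there is exactly one path between any two buses, which is the property the abstract relies on for deadlock freedom.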
Akira YAMAZAKI Fukashi MORISHITA Naoya WATANABE Teruhiko AMANO Masaru HARAGUCHI Hideyuki NODA Atsushi HACHISUKA Katsumi DOSAKA Kazutami ARIMOTO Setsuo WAKE Hideyuki OZAKI Tsutomu YOSHIHARA
The voltage margin of an embedded DRAM's sense operation has been shrinking with the scaling of process technology. A method to estimate this margin would be a key to optimizing the memory array configuration and the size of the sense transistor. In this paper, the voltage margin of the sense operation is theoretically analyzed. The accuracy of the proposed voltage margin model was confirmed on a 0.13-µm eDRAM test chip, and the results of calculation were generally in agreement with the measured results.
A large memory is typically designed with multiple identical memory blocks to reduce delay and power. The circuit verification of individual memory blocks can be handled effectively by the Symbolic Trajectory Evaluation (STE) approach. However, if multiple memory blocks are integrated into a single system, the STE approach cannot verify it economically. This paper introduces algorithms for verifying the block-level connectivity of memories. The verification time of a large memory can be reduced drastically by using a bottom-up verification scheme. That is, a memory block is first verified thoroughly, and then only the interconnection between the memory blocks of the large memory needs to be verified. The proposed verification algorithms require 3n + 2(log₂ n + 1) + 3 log₂ m Read/Write operations for a 2^n × m-bit memory, where n and m are the address width and data width, respectively. Also, the algorithms can verify 100% of the inter-port and intra-port signal-misplacement faults of the address, data input, and data output ports.
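To give a feel for the operation count, the sketch below evaluates it for one memory size. It assumes the expression reads 3n + 2(log2 n + 1) + 3 log2 m, that the memory holds 2^n words of m bits, and that the logarithms are rounded up when n or m is not a power of two; the rounding and the exhaustive baseline (one write plus one read per word) are illustrative assumptions, not taken from the abstract.

```python
import math


def connectivity_ops(n, m):
    """Read/Write operations for the connectivity check:
    3n + 2(log2 n + 1) + 3 log2 m, logs rounded up (assumption)."""
    return (3 * n
            + 2 * (math.ceil(math.log2(n)) + 1)
            + 3 * math.ceil(math.log2(m)))


def exhaustive_ops(n):
    """Illustrative baseline: write then read every one of the 2^n words."""
    return 2 * 2 ** n


# 16 address bits, 32 data bits.
ops = connectivity_ops(16, 32)   # 48 + 10 + 15
baseline = exhaustive_ops(16)
```

For n = 16 and m = 32 the connectivity check needs 73 operations, against 131072 for the illustrative exhaustive sweep, because its cost grows with the port widths rather than with the number of words.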
In this paper, a new computing paradigm suitable for analog circuit systems is described in comparison with digital circuit systems. Analog circuit systems have some disadvantages, especially in terms of accuracy and stability, but some applications do not require accuracy or stability in the circuit components. A new computing concept for such applications, 'inaccurate' or 'rough' information processing, is proposed and described, together with some examples of such applications.
Takeshi FUJINO Akira YAMAZAKI Yasuhiko TAITO Mitsuya KINOSHITA Fukashi MORISHITA Teruhiko AMANO Masaru HARAGUCHI Makoto HATAKENAKA Atsushi AMO Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
A low power 16 Mb embedded DRAM (eDRAM) macro is fabricated using 0.15 µm logic-based embedded DRAM process technology. A 0.5 µm² CUB (
Tetsuya ASAI Yuusaku NISHIMIYA Yoshihito AMEMIYA
The Belousov-Zhabotinsky (BZ) reaction provides important clues for controlling 2D phase-lagged stable synchronous patterns in an excitable medium. Because large reaction-diffusion systems are difficult to compute using conventional digital processors, we propose a cellular-automaton (CA) circuit that emulates the BZ reaction. In the circuit, a two-dimensional array of parallel processing cells is responsible for fast emulation, and its operation rate is independent of the system size. The operations of the proposed CA circuit were demonstrated by using a simulation program with integrated circuit emphasis (SPICE).
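How a cellular automaton can mimic an excitable medium can be sketched with the classic three-state Greenberg-Hastings rule. This is a stand-in chosen for illustration, not the rule implemented by the proposed circuit, which the abstract does not specify; the function name `gh_step` is hypothetical.

```python
RESTING, EXCITED, REFRACTORY = 0, 1, 2


def gh_step(grid):
    """One synchronous update of a Greenberg-Hastings automaton:
    excited -> refractory, refractory -> resting, and a resting cell
    becomes excited if any 4-neighbour is excited."""
    h, w = len(grid), len(grid[0])
    new = [[RESTING] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            state = grid[y][x]
            if state == EXCITED:
                new[y][x] = REFRACTORY
            elif state == REFRACTORY:
                new[y][x] = RESTING
            else:
                neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                if any(0 <= ny < h and 0 <= nx < w
                       and grid[ny][nx] == EXCITED
                       for ny, nx in neighbours):
                    new[y][x] = EXCITED
    return new


# A single excited cell launches a wave that travels along a 1 x 5 strip.
strip = [[EXCITED, RESTING, RESTING, RESTING, RESTING]]
after_one = gh_step(strip)
after_two = gh_step(after_one)
```

Each cell reads only its own state and its neighbours' states, so in hardware all cells can update at once. That locality is what makes the operation rate independent of the array size, whereas this sequential software loop scales with the number of cells.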
Akira YAMAZAKI Takeshi FUJINO Kazunari INOUE Isamu HAYASHI Hideyuki NODA Naoya WATANABE Fukashi MORISHITA Katsumi DOSAKA Yoshikazu MOROOKA Shinya SOEDA Kazutami ARIMOTO Setsuo WAKE Kazuyasu FUJISHIMA Hideyuki OZAKI
A 23.3 mm² 32 Mb embedded DRAM (eDRAM) macro has been fabricated using 0.18 µm triple-well 4-metal embedded DRAM process technology to realize an accelerated 3-D graphics controller. The array architecture, using a dual-port sense amplifier, achieves a column access latency of two cycles at 222 MHz and a peak data rate of 14.2 × 4 GB/s at 4 macros. The process cost has been kept low by using VT-MOS circuit technology and taking advantage of a characteristic of dual-gate oxide process technology. A tRAC of 11.6 ns at 2.0 V is achieved using a 'pre-detect redundancy' circuit.
Hiroyuki KURINO Yoshihiro NAKAGAWA Tomonori NAKAMURA Yusuke YAMADA Kang-Wook LEE Mitsumasa KOYANAGI
The smart vision chip has great potential for application in general-purpose high-speed image processing systems. To fabricate compact smart vision chips that include photodetectors, we have proposed applying three-dimensional LSI technology to smart vision chips. Three-dimensional technology has great potential to realize new biologically inspired systems that draw not only on biological function but also on biological structure. In this paper, we describe our three-dimensional LSI technology for biologically inspired circuits and the design of smart vision chips.
This paper gives a detailed presentation of a "vision chip" for very fast detection of motion vectors. The chip's design consists of a parallel pixel array and column-parallel block-matching processors. Each pixel of the pixel array contains a photodetector, an edge detector and 4 bits of memory. To detect motion vectors, the gray-level image is first binarized by the edge detector, and the binary edge data is then used in the block-matching processor. The block matching takes place locally in the pixel and globally in the column. The chip can create a dense motion field in which a vector is assigned to each pixel by overlapping 2 × 2 target blocks. A prototype with 16 × 16 pixels and four block-matching processors has been designed and implemented. Preliminary results obtained with the prototype are shown.
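Block matching on binary edge data can be sketched in a few lines. The following is a minimal software illustration, assuming a 2 × 2 target block, a search range of ±1 pixel, and a mismatch cost equal to the number of differing bits; the search range, the cost function, and the names `block_mismatch` and `motion_vector` are assumptions, and the split between in-pixel and in-column processing on the chip is not modelled.

```python
def block_mismatch(prev, curr, y, x, dy, dx, block=2):
    """Count differing bits between the block at (y, x) in the previous
    frame and the block displaced by (dy, dx) in the current frame."""
    count = 0
    for by in range(block):
        for bx in range(block):
            count += prev[y + by][x + bx] ^ curr[y + by + dy][x + bx + dx]
    return count


def motion_vector(prev, curr, y, x, block=2, search=1):
    """Return the displacement (dy, dx) with the fewest mismatching bits."""
    h, w = len(curr), len(curr[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements whose block would leave the frame.
            if not (0 <= y + dy and y + dy + block <= h
                    and 0 <= x + dx and x + dx + block <= w):
                continue
            cost = block_mismatch(prev, curr, y, x, dy, dx, block)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]


# A diagonal edge fragment moves one pixel to the right between frames.
prev = [[0] * 5 for _ in range(5)]
curr = [[0] * 5 for _ in range(5)]
prev[1][1] = prev[2][2] = 1
curr[1][2] = curr[2][3] = 1
vector = motion_vector(prev, curr, 1, 1)
```

Because the inputs are single bits, the cost reduces to XOR and counting, which suggests why binarizing the image first keeps the matching hardware small.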
A computational sensor (also called a smart sensor or vision chip) is a very small integrated system in which processing and sensing are unified on a single VLSI chip. It is designed for a specific target application. This paper describes research activities on computational sensors. Quite a few computational sensors have been proposed and implemented. First, their approaches are summarized from several points of view: advantages vs. disadvantages, neural vs. functional, architecture, analog vs. digital, local vs. global processing, imaging vs. processing, and new processing paradigms. Then, several examples are introduced, covering spatial processing, temporal processing, A/D conversion, and programmable computational sensors. Finally, the paper is concluded.
Haruo KOBAYASHI Takashi MATSUMOTO
There are two dynamics issues in vision chips: (i) the temporal dynamics issue due to the parasitic capacitors in a CMOS chip, and (ii) the spatial dynamics issue due to the regular array of processing elements in a chip. These issues are discussed in [1]-[3] for the resistor network with only associated parasitic capacitances. In this paper, however, we consider parasitic inductances as well as parasitic capacitances to obtain a more precise network dynamics model. We show that in some cases the temporal stability condition for the network with parasitic inductances and capacitances is equivalent to that for the network with only parasitic capacitances, but in general they are not equivalent. We also show that the spatial stability conditions are equivalent in both cases.
Akira YAMAZAKI Tadato YAMAGATA Yutaka ARITA Makoto TANIGUCHI Michihiro YAMADA
The features for the integration of 1Tr/1C DRAM and logic for graphic and multimedia applications are surveyed. The key circuit/process technology for large scale embedded DRAM cores is described. The methods to improve transistor performance and gate density are shown. Noise immunity design and easy customization techniques are also introduced.
Barry SHACKLEFORD Mitsuhiro YASUDA Etsuko OKUSHI Hisao KOIZUMI Hiroyuki TOMIYAMA Akihiko INOUE Hiroto YASUURA
Entire systems embedded in a chip and consisting of a processor, memory, and system-specific peripheral hardware are now commonly contained in commodity electronic devices. Cost minimization of these systems is of paramount economic importance to manufacturers of these devices. By employing a variable configuration processor in conjunction with a multi-precision compiler generator, we show that there are situations in which considerable system cost reduction can be obtained by synthesizing a CPU that is narrower than the largest variable in the application program.
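The idea of running wide variables on a narrower CPU can be illustrated with multi-precision addition, which is the kind of code a multi-precision compiler would have to generate. The sketch below is illustrative only: the 32-bit variable width, the 8-bit datapath, and the function name `add_multiprecision` are assumptions, not figures from the paper.

```python
def add_multiprecision(a, b, var_bits=32, cpu_bits=8):
    """Add two var_bits-wide unsigned values using only cpu_bits-wide
    additions with carry, as a narrow datapath would.
    Assumes var_bits is a multiple of cpu_bits.
    Returns (result modulo 2**var_bits, number of narrow additions)."""
    mask = (1 << cpu_bits) - 1
    steps = var_bits // cpu_bits
    carry = 0
    result = 0
    for i in range(steps):
        shift = i * cpu_bits
        # One narrow add: two cpu_bits-wide words plus the incoming carry.
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> cpu_bits
    return result, steps  # the final carry is dropped (wrap-around)


total, narrow_adds = add_multiprecision(0x00FFFFFF, 1)
```

Here one 32-bit addition costs four 8-bit additions. The trade-off is therefore more cycles for the few widest variables in exchange for a smaller datapath, which is where the cost reduction described in the abstract would come from.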
Satoshi MATSUMOTO Masato MINO Toshiaki YACHI
Integrating the power supply and signal processing circuit into one chip is an important step towards achieving a system-on-chip. This paper reviews current technologies and trends in power supply components for the system-on-chip, such as DC-DC converters, intelligent power LSIs, and thin-film magnetic devices. A device structure has been proposed for the system-on-chip that is based on a quasi-SOI technique, in which the buried oxide layer is partially removed from the SOI substrate. In this structure, the CMOS devices for the digital signal-processing circuit and the bipolar transistors are formed in a conventional SOI region, and the CMOS analog devices and high-voltage devices are formed in a quasi-SOI region.
Toshiro TSUKADA Keiko Makie-FUKUDA
Digital-switching noise coupled into sensitive analog circuits is a critical problem in large-scale integration of mixed analog and digital circuits. This paper describes noise coupling of this kind, especially through the substrate in CMOS integrated circuits, and reviews recent technical solutions to this noise problem. Simplified models have been developed to simulate the substrate coupling rapidly and accurately. A method using a CMOS comparator was proposed for measuring the effects of substrate noise, and equivalent waveforms of actual substrate noise were obtained. A circuit technique called active guard band filtering, which controls the noise source, is a new approach to substrate noise decoupling. CAD methods for handling substrate-coupled switching noise are making design verification possible for practical mixed-signal LSIs.
Masakatsu MARUYAMA Hiroyuki NAKAHIRA Shiro SAKIYAMA Toshiyuki KOHDA Susumu MARUNO Yasuharu SHIMEKI
This paper discusses a digital neuroprocessor named Quantizer Neuron Chip (QNC) employing the Quantizer Neuron model and two newly developed schemes: "concurrent processing of quantizer neuron" and "removal of ineffective calculations". QNC simulates neural networks named the Multi-Functional Layered Network (MFLN) with 64 output neurons, 4672 quantizer neurons and two million synaptic weights, and can be used for character or image recognition and learning. The chip achieved a processing speed of 1.6 µs per output neuron for recognition and 20 million connections updated per second (MCUPS) for learning. In addition, QNC can execute multichip operation to increase the size of networks. We applied QNC to handwritten numeral recognition and realized high-speed recognition and learning. QNC is implemented in 1.2 µm double-metal CMOS with 'sea of gates' technology and contains 27,000 gates on a 10.99 × 10.93 mm² chip.
Luigi RAFFO Silvio P. SABATINI Giacomo INDIVERI Giovanni NATERI Giacomo M. BISIO
The paper describes the architecture and the simulated performance of a memory-based chip that emulates human cortical processing in early visual tasks, such as texture segregation. The featural elements present in an image are extracted by a convolution block and subsequently processed by the cortical chip, whose neurons, organized into three layers, gain relational descriptions (intelligent processing) through recurrent inhibitory/excitatory interactions between both inter- and intra-layer parallel pathways. The digital implementation of this architecture directly maps the set of equations determining the status of the cortical network to achieve an optimal exploitation of VLSI technology in neural computation. Neurons are mapped into a memory matrix whose elements are updated through a programmable computational unit that implements synaptic interconnections. By using 0.5 µm CMOS technology, full cortical image processing can be attained on a single chip (20 × 20 mm² die) at a rate higher than 70 frames/second, for images of 256 × 256 pixels.